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Abstract 

Although there are many methods for functional data analysis (FDA), little emphasis is put on 
characterizing variability among volatilities of individual functions. In particular, certain individu- 
als exhibit erratic swings in their trajectory while other individuals have more stable trajectories. 



c3 

j i . 

There is evidence of such volatility heterogeneity in blood pressure trajectories during pregnancy, 



for example, and reason to suspect that volatility is a biologically important feature. Most FDA 
models implicitly assume similar or identical smoothness of the individual functions, and hence 
can lead to misleading inferences on volatility and an inadequate representation of the functions. 

o: 

. We propose a novel class of FDA models characterized using hierarchical stochastic differential 
£vq | equations. We model the derivatives of a mean function and deviation functions using Gaussian 
processes, while also allowing covariate dependence including on the volatilities of the deviation 

^ ■ 

functions. Following a Bayesian approach to inference, a Markov chain Monte Carlo algorithm is 
used for posterior computation. The methods are tested on simulated data and applied to blood 
pressure trajectories during pregnancy. 
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1 Introduction 



Multi-subject functional data arise frequently in many fields of research, including epidemiology, 
clinical trials and environmental health. Sequential observations are collected over time for multiple 
subjects, and can be treated as being generated from a smooth trajectory contaminated with noise. 
There are a rich variety of methods available for characterizing variability and covariate dependence 
in functional data ranging from hierarchical basis expansions to functional principal components 
analysis (FPCA). In defining models for functional data and in interpreting variability in trajec- 
tories, either unexplained or due to covariates, the emphasis has been on differences in the level 
and local trends. Dynamic features, such as velocity, acceleration and especially volatility, are also 
important but receive much less attention. 

Analysis of functional data dynamics studies temporal c hanges in trajectories and effects of 



covariates on these changes. For example, 



Wang et al. 



(120081 ) used linear differential equations to 



model price velocity and acc eleration in eBay auctions and explored the auction subpopulation 



effect. 



Miiller and Yaol (120101 ) modeled the velocity of online auction bids using e mpirical sto c hastic 



Zhu et al 



(2011) 



differential equations with time-varying coefficients and a smooth drift process, 
inferred the rate functions of prostate-specific antigen profiles using the proposed semiparametric 
stochastic velocity model, in which rate functions are regarded as realizations of Ornstein-Uhlenbeck 
processes dependent on covariates of interest. 

This article investigates a different dynamic feature, the volatility, which measures the condi- 
tional variance of trajectory changes over an infinitesimal time interval. We propose a stochastic 
volatility regression (SVR) model with Gaussian process (GP) priors used for the group mean and 
subject specific deviation functions through stochastic differential equations (SDEs). We further 
accommodate inference on covariate effects on volatility through allowing the diffusion term of SDEs 



for deviation functions to depend on covariates. Al though yolati 



through stochastic volatility (SV) mod e 



2005 



Barndorff-Nielsen and Shephardl . 



sin finance (Heston 



1993 



i ty has been extensively studiec 



Jacquier et al. 



2002 



Shephardl . 



20121 ). the setting, model specifications and data features 



are distinct from ours. SV models typically focus on a single volatility process which is time- varying 
and associated with a price process for high-frequency fina nce data. Mo r e rele v ant is the literature 

(2011) 



on mu l tivariate SV models; for re cent references, refer to 



(120111 ). Ilshihara and Omoril (120 12f ) and 



Durante et al. 



Loddo et al. 



(120121 ). 



Van Es and Spreij 
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This setting differs from ours in that the focus is on multivariate time series modeling instead 
of functional data analysis, with interest in the joint volatility dynamics over time for the different 
assets. In contrast, we are interested in studying variation across individuals in a time-constant 
subject-specific volatility; that is, certain subjects may have very smooth trajectories while other 
subjects have erratic trajectories. It is our conjecture that such volatility heterogeneity is common 
in biomedical settings, but is overlooked in analyzing data with models that implicitly prescribe 
a single level of smoothness for all subjects. As data are sparse and irregularly spaced in most 
studies, it is not surprising such behavior is overlooked. However, the volatility in a biomarker may 
be as important or more important than the overall level and trend in the biomarker. We provide 
motivation through the following longitudinal blood pressu re data set. 



The Healthy Pregnancy, Healthy Baby Study (HPHB, iMiranda et al.l . 120091 ) collected longitu- 
dinal blood pressure (BP) measurements for pregnant women. Blood pressures are measured at 
irregularly spaced times during the second and third trimesters with the number of measurements 
per subject varying from 9 to 19. We are interested in estimating subject-specific volatilities of 



BP trajectories and in identifying covariates associated with the volatility. Figure Q (a) plots mean 
arterial pressure (MAP) trajectories for twenty randomly selected normal women and women with 
preeclampsia, respectively. Clearly the MAP trajectories among the preeclampsia group are more 
wiggly than the ones in the normal group, which is also implied by boxplots of log-transformed em- 
pirical volatilities in Figure QjfbJ To explore volatility differences among various groups in addition 



to preeclampsia, we apply normal linear regression for log-transformed empirical volatilities with 
the covariates race, mother's age, obesity, preeclampsia, parity and smoking. The results suggest 
that preeclampsia and smoking (p-values 0.0005 and 0.002) are associated with empirical volatility. 
This is a two-stage approach, which is useful as an exploratory tool but ignores measurement errors 
and uncertainty in volatility estimation. 

[Figure 1 about here.] 



Additionally, empirical volatilities in Figure njfbj are heterogeneous even within the normal or 



preeclampsia group. This heterogeneity will be largely omitted when we apply FDA methods with 
identical or similar smoothness for individual functions within a group. Consequently, the wiggly 
trajectories will be over-smoothed while the smooth trajectories will be under-smoothed. We can 
potentially estimate the individual trajectories separately but it is well known that borrowing of 



information will dramatically improve performance for sparse functional data. In addition, separate 
estimation does not allow for inferences on covariate effects and unexplained variability in volatility. 

As for the clinical question addressed, the previous FDA methods mainly focus on the shift 
of blood pressure level and ignore examining the volatility of blood pressure, which measures the 
haemodynamic stability and is crucial for cardiovascular health. For example, a recent study shows 
that blood pressure stability rath er than blood pressure level is associated with increased survival 
among patients on hemodialysis ( jRaimann et all 120121 ) . For the HPHB study, we observe that 
preeclampsia is commonly accompanied by blood pressure over-swinging. The joint effect of high 
blood pressure level and large volatility may lead to the adverse birth outcomes, such as low birth 
weight and preterm birth. 

The remainder of the article is organized as follows. Section [2] specifies the SVR model and 
discusses its properties. Section [3] develops an efficient Markov chain Monte Carlo algorithm for 
posterior inference. Section H] presents simulation studies and the proposed method is applied to 
a real dataset in Section [5j Finally, section [6] contains concluding remarks and future possible 
extensions. 



2 Stochastic Volatility Regression Model 
2.1 The Model Specification 

Suppose that Yi(t), % = 1,2, ... ,m, is the observation of the ith subject at time t G 71 = 
ti,2, 4 4 • i~ki,m < tu} with Ti the set of observation times before time tjj for the ith subject. 
We specify an observation equation for Y{(t) as 

Y i (t) = M k .(t) + U i (t) + e i (t), (1) 

where Yi(t) is contaminated by the measurement error Si{t) following a one-dimensional normal 
distribution with mean and variance o 2 e . Assuming the ith subject belongs to the fcjth group (e.g. 
by race or treatment) with fcj G {1, 2, ••• ,g}, we include a fcjth group mean function M^it) = 
E{Yi(t) | Mk^t)} in the observation equation. In addition, the trajectory of the ith subject will 
unlikely coincide with (t) and therefore the departure from (t) is addressed and represented 
by the subject-specific deviation function Ui(t) with E{[Zj(t)} = 0. 
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The volatility of the zth subject is defined as the conditional variance of the (q — l)th or- 
der derivative of Ui(t) over an infinitesimal time interval. Namely, we denote the volatility a\j. = 



lim h 1 E 

h^-0 



with differential operator D q = Asvolatil- 



dti ' 



{D q ~ l Ui(t + h)- D q ~ l Ui(t)} 2 | D q - l Ui{t) 
ity approaches zero, Ui(t) would be a roughly flat line. In contrast, increasing the value of volatility 
would lead to a more wiggly £/j(t) with a larger magnitude of fluctuation around M k .(t). 

We specify Gaussian process priors for M ki (t) and Ui(t) using SDEs which incorporate the group 
and individual volatilities a 2 Mk and a^. : 

D*>M ki {t) = a Mki W ki {t), (2) 
D*Ui(t) = cruW'it), (3) 

where p, q G N > 1 and <Ju k ., &Ui e 1R + ; Vt^i) and W"/(i) are independent Gaussian white 
noise processes with E{W k .(t)} = E{W((t)} = and covariance function E{W ki (t)Wk i (t')} = 
E{W((t)W((f)} = 8{t-t'), a delta function. We denote M hi0 = {M fe .(0), D l M ki {$), ■ ■ ■ , £> p_1 M fci (0)} 
and C7jo = {£/j(0), -D 1 ?7i(0), . . . , -D g_1 £/i(0)} as the initial values of M k .(t) and Ui(t) as well as their 
derivatives till orders q — 1 and p — 1 respectively. The volatility cr^. in SDE ([3]) is allowed to vary 
between subjects and across covariates. In this article, we focus on a simple transformed mean rela- 
tionship, namely log(cr^) ~ Ni(a^/3, cr 2 ), which can be extended to the more complex specifications 
with less restrictive assumptions and to high- dimensional covariates. 

The mean and covariance functions of Gaussian process priors for M k .(t) and Ui(t) can be 
obtained by applying stochastic integration to SDEs (T5J) and ([3]), resulting in the following lemma. 

Lemma 1 M k (t), k = 1,2, ... ,g, and Ui(t), i = 1,2, ... ,n, are the summations of mutually in- 
dependent Gaussian processes written as M k (t) = M k0 (t) + M k i(t) and Ui(t) = Uio(t) + Unit) 
with corresponding mean functions E{M k0 (t)} = E{M k i(t)} = E{Uio(t)} = E{Un(t)} = and 
covariance functions 

9-1 

^M k0 (s,t) = a 2 Mo n Mo {s,t) = a 2 Mo ^2<f>i(s)(f)i(t), 

1=0 



^M kl (s,t) = a 2 Mh K Ml {s,t) = a 2 Mk J G g (s,u)G, 



,(t, u)du, 



p-i 



JC Uw (s,t) = al o TZ Uo (s,t) = a 2 Uo ^0 { (s)0i(t), 



1=0 



fcua (s, t) = crfj.TZ^ {s, t) = <j 2 v , J G p (s, u)G p (t, u)du, 
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respectively, where <j>i\t) = -jj, G q (s,u) = ^fjjj an & s,t,u E T = [0, %]. 

Hence, we can represent the prior of M^(t) + Ui(t) as a hierarchical Gaussian process, 

M fe .(t) + | M fei (t) ~£P(M fci (t),/C^jM) + /WM)), 
M fci (t) ~ 0P(O, )C Mh0 (s, t) + /C A4l (s, t)), 

where QV(M(t), JC(s, t)) denotes a Gaussian process with mean function M( t) and covariance func - 



tion K.(s,t). Different from the previous hierarchical Gaussian process prior (IPark and Choil . 120101 ). 
in which the covariance function is modeled as a squared exponential kernel and is identical across 
the subjects within a group, here /Q/ i0 (s, t) +)Cu il {s,t) is subject specific and depends on covariates 
through afj.. 

To carry out Bayesian inference, we further specify the following prior distributions for the 
hyperparameters. In particular, M^o ~ N p (0, erf, /() J) with a\ Q = 10 4 , Uio ~ N g (0,cr^ o 7), a 2 ~ 
invGamma(a, b), a\ lk ~ invGamma(a, b) and afj o ~ invGamma(a, b), where invGamma(a, b) denotes 
the inverse gamma distribution with shape parameter a and scale parameter b. The (3 and a 2 follow 
the independent Jeffreys' prior, f((3,a 2 ) oc l/cr 2 . 

2.2 Double-Penalized Smoothing Spline 

It is well known that the Baye s estimate wit h the integrated Wiener process prior is identical to 



the smoothing spline estimate (IWahbal . Il990l ). By similar arguments, we can show that when the 
volatilities are given and <j 2 Mq and a^ o go to infinity, the posterior means of Mfc(t) and Ui(t) are 
equivalent to the double-penalized smoothing spline Mk(t) + Ui(t), which is the minimizer of the 
double-penalized sum-of-squares, 



m 1 n, 

DPSS = E - E - m *m - u ^ 2 + ( 4 ) 

i=i 1 j=i 

g „ -m „ 

VA A4 / {D?M k (t)} 2 dt + VA,, / {D*Ui(t)} 2 dt, 
k =i J r i=l Jt 

where penalty terms j T {D p Mk(t)} 2 dt and j T {D q Ui(t)} 2 dt penalize the roughness of Mfc(i) and 

Ui(t) respectively, where the smoothness and the fidelity to data are balanced by the smoothing 

parameters \j^ k = — | — and A^ = n ^ . Expression for Mfc(t) and Ui(t) can be obtained 

,:/,, /, " ,a u>: ni ° Vi 
explicitly, as shown in the following Theorem. 



Theorem 1 The smoothing splines M k (t) and Ui(t) with t G T minimize the double-penalized 
sum-of-squares (J1J and have the forms 

p—1 n 

M k (t) = J2^i4>i(t) + J2 u ^ Ml (tj,t) = A*fc^ M (*) + f'fciw*) 

1=0 i=i 

q—1 rii 
1=0 j=l 

where fi k = {fj, k0 , fi kU ■ ■ ■ , /i fc ( P -i)}', ^fe = Wki, v k 2, • • • , v kn )' , «i = {a«i> • - • , Oifa-i)}' 
7i = (Ta>7i2) ■ ■ • ,7mJ' o^e i/ie coefficients for the bases 

<t> fl {t) = {<p {t)Ai{t),--- ,<f>p-i(t)}', R Ml (t) = {n Ml (t 1 ,t),n Ml (t 2 ,t),--' ,n Ml {t n ,t)y, 

<t> a {t) = {0 o (i), 0i (*),••• A q -i(t)Y, Ru*(t) = {KuAtn^KuAta,*),--- ,K Vl (t ini ,t)Y, 

with tjET m = \J™ = {Ti = {tj, j = 1, 2, . . . , n}, the index set of unique observation times among all 
m subjects. 

Given M k {t) and Ui(t), the double-penalized sum-of-squares (j3J) can be written as 

m j 

DPSS = ~i Y i ~ ^i^Vkt ~ AiRM^ki - - R Un~fi)'x (5) 

i=l Ui 

(Yi - Ai^/ife. - AiR Ml Uki ~ 4> ai (Xi - R Un j i )+ 

9 m 

k=l i=l 

where 

Yi = {Y(tii), Y(t i2 ), ■ • • , Y(t in -)Y, Aj = (Sjj/) n . xn , 

<f>ii = {4*^ 1 ),4> ft (t 2 ),"- R Ml ={R Ml (ti),R Ml (t2),--- ,R Ml (t n )}, 

0a, = {<t>a(til), a (*i2), ■ • • ,<t>a.{Un,)} , Ru n = {Ru a (til) , Ru a (U2), " • • , -Rf/a (*in< ) } 

wii/i <5jj/ = 1 i/it/i subject has an observation at time Uj = tj>, Uj G %, ty G 7^ and Sjf = 0, 
otherwise. 

The proofs of Theorem [I] and the following Corollary are included in Appendix lAl 

Corollary 1 The fi k , v k , oti and^ i can be obtained through a backfitting algorithm or the Gauss- 
Seidel method, iterating the following two steps until convergence: 
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(a) for each i, & t = (^S^^-^S^Y, and 7l = {i - fc^S^J- 
where S Vi = R Vi i + n^uj and Y { = Y { - A^^p,^ - AiR Ml i> ki ; 



(b) for each k, fi k = (^A'5 M ^ A0J 1 ^A' S A }Y k and 
u k = 5i{l-A^(^A'^A^)-ViA'^}n ; where S 
( Yi - - Ru a %) andA= " A i A - 



ARm x + ^M k I, Yk 



% 1 Pc4 — A> 



3 Posterior Computation 

Although we can obtain M k (t) and Ui(t) by the backfitting algorithm outlined in C orollary [J and 



Wahba 



19901 ) . it is unclear 



estimate Xu k and \u t through generalized cross validation (Chap. 4, 
how to allow to depend on covariates. In addition, when n is large, it is computational infeasible 
to invert the n x n matrix SM k involved in the backfitting algorithm. Instead, we propose a Markov 
chain Monte Carlo (MCMC) algorithm for posterior computation that solves these problems. The 
algorithm achieves computational efficiency by leveraging on the Markovian propert y of SD Es and 
samples M k (t) and U.i(t) through the simulation smoother (IDurbin and Koopmanl . 120021 ). which 
requires the following Proposition. 

Proposition 1 Let X(t) denote a (r-l)th-order integral Wiener process, defined by the stochastic 
differential equation D r X(t) = W(t). Consequently, the Xj = {X(tj), D 1 X(t j ), ■ ■ ■ , D r ~ l X(tj)}' , 
j — 1, 2, • • • , n, follows a state equation 



X j + \ — GjXj + a; 



31 



where Gj = JJk=o f\ C and ~ K(0,Wj) with Wj = J J exp{C(^ - u)}DD' exp{C'(5j - 
u)}du, C = (cu') rxr , cur = 1 when V = I + 1 and c\y = otherwise, D = (0, 0, ••• ,1)' and 

The proof is in Appendix |A] Finally, we outline the proposed MCMC as follows. 



(1) Given M ki (tij), 0% and o~y., sample Ui(t 
Yi(tij t 



13 )i 



1,2, 



,m, j 



0,1, 



, rii. Let Yrj 



M k (tn) and the SVR model for the it 



1 sub 



space model (IJonesl . 



1993 



Durbin and Koopman 



ect can be expressed as the following state 



200ll ) , from which we can draw samples of Ui (t y - ) 
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and its derivatives using the simulation smoother. 

= F Ui3 Uij + e UiV 

where F Vij = (1,0,- ■■ ,0), U tj = {Ufa), D l Ufa), ■ ■ ■ , D^Ufa)}' and e Vij L ~ Hfaaf). 
Similar to the Gj, U3j and Wj in Proposition [T], the Gjj ip oji/.. and follow the same specifi- 

cations with r = q. 

(2) Given Ui(tj), a 2 and a 2 M , sample Mk(tj), k — 1, 2, • • • , j = 0, 1, • • • , n. Similarly, we rewrite 
the SVR model for the /cth group as the following state space model and then sample M ki (tij) an d 
its derivatives by the simulation smoother. 

Y M kj = F M k] M kj + £ Mkp 

M Hj+i) = G MkJ M kj + a Mk u MkP 

where Y Mk , = (Y^ fc .) mx i, M kj = {Mfa, D'Mfa, ■ ■ ■ , D^Mfa}' , F Mk . = (F* k .) mxp and 
£M k] = diag(e x Mk . , e 2 Mk . , ■ ■■ ,e™ Ikj ). When zth subject has an observation at time tj and ki = k, 
Y l MkJ = Y iih) - Ufa), F*J fc . = 1 and e\ hj ~ N^O,^ 2 ). Otherwise, = F« kJ = e^. = 0. The 
GM kj , &M h - and W ' M kJ are given by Proposition [T] with r = p. 

(3a) Given M ki (tij) and Ui(tij), i = 1,2, ••■ ,m, j = 1, 2, ••• ,n», sample a 2 from the posterior 
distribution invGamma (a + \ Ya=i XXi _ M kfa) ~ Ufa)} 2 ^) . 

(3b) Given £/ i0 , sample cr^ from the posterior distribution invGamma (a + ^j 2 , ft + | 2\^o ^o^«o) • 

(3c) Given iVfjy, sample cr^ from the posterior distribution 

invGamma (a + f , b + \ YT^M k[3+l) - G Mk .M hj )'W^ k , (M k{j+1) - G Mk .M kj )) . 

(3d) Given U ^, (3 and a 2 , sample afj. using a Metropolis-Hasting algorithm. We choose afj. ~ 
invGamma(a, b) as the proposal prior distribution and a proposal a"jj_ can be easily drawn from 
invGamma (a + ^f,b + \ X^o^^O'+i) - G u l3 U ij )'Wu\(U i ( j+1} - Gu^Uij)) the corresponding 
proposal posterior distribution. The a 2 ^ will be accepted with the following probability and dis- 
carded otherwise with afj. unchanged, 



min 





x'A a 3 ) Y[7=o MU«j+i) ~ Gu^j 




aufai) 








a ufa t ) 
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where /ln, fa and fa denote the log-normal, g-dimensional normal and inverse gamma probability 
density functions respectively with a Vi = a+^f-, b Ui = 6+| YJj=o ( u i{j+i)~ G u l3 U i^'W^U 

(4) Given ofr., sample (3 and a 2 . Let Z = (log a 2 ^ , log aj) 2 , • • ■ , loga 2 ^)', /§ = (X'X) _1 X'Z and 
a 2 = ^ Z ~ X ^llk~ X ^ ■ We draw r from Chi-squared distribution with m — k degrees of freedom and 
set a 2 = (m ~ fc) * 2 and then sample /3 from N m ^(X'X)- 1 ) . 



4 Simulation 

We carry out two simulation studies to evaluate the performance of the propose d method and 



compare it to alternative methods includ ing nature cubic spline (NCS, IWahbal . Il990l ) and functional 



principal components analysis (FPCA, lYao et all 120051 ) . The comparison focuses on performance 



in estimating the trajectory M^(t) + Ui(t), the volatility a 2 }, and the coefficients f3. 

The first simulation study is designed to investigate the consequence of ignoring either sim- 
ilarity or heterogeneity of volatilities when they are present. One hundred replicated datasets, 
each consisting of 100 trajectories, are sampled from the SVR model, in which the log-transformed 
volatilities are varying and normally distributed. More precisely, we choose (3 = (0, 0.6, 2)' and 
Xi = (1, Xjx, Xj 2 )' with Xn and Xi2 sampled from xn Bernoulli(0.4) and Xi2 1^(0, 0.25) re- 
spectively. Given f3 and x iy volatilities cr^'s can be drawn from log(a 2 /i ) ~ Ni(a^/3, 1). Along 
with a 2 M = a 2 M2 = 10, a 2 = 1, p = 2 and q = 1, Mi(t), M 2 (t), Ui(t) and Ei{t) are sampled at 
t G {0.2,0.4, ••• ,4} from equations (T5]) and ([3]) and the distribution of measurement error £i{t). 
Twenty percent of samples are removed completely at random, resulting in on average 16 unequally 
spaced observations per subject. Finally, the ith subject is randomly assigned to one of the two 
groups with equal probability and Yi(t) is obtained from observation equation (TJ|). 

The simulated datasets are analyzed by three methods, SVR, NCS and FPCA. We first apply 
the SVR approach, using the proposed MCMC algorithm with 15,000 iterations and keeping every 
5th of the last 10,000 samples for posterior analysis. It takes about 80 minutes on a PC with 
2.33GHz Intel(R) Xeon(R) CPU. Posterior means are chosen as the estimates of Mj^if) + Ui(t), afj. 
and (3. Additionally, the trajectories M k .{t) + C/j(t)'s are estimated by NCS for one subject at a 
time, and by FPCA for all subjects within a group and separately by the group, taking about 1 



10 



minute and 2 minutes on the same PC respectively. For NCS and FPCA methods, we may also 
estimate covariate effects on volatility through a two-stage method: estimating empirical volatility 
by — y^lT 1 Ut -^ j n the first stage with Oj,- the estimate of UAt) at time £j,-: and in the 

second stage, empirical volatilities are regressed on covariates to obtain the estimate of (3. 

For each simulated dataset, we calculate average squared error (ASE) for the trajectory ASE(M+ 
U) = i YZi h E;=i {MkMj) + UiiUj) - Af^fy) - ASE for log volatility ASE{%(^ )} = 

— Y^iLx [}°9{°u) ~ ^°5 , ( <7 L r ,)} 2 ) an d squared errors (SE) for coefficient estimates SE($) = 0i—(3i) 2 , 
I = 0, 1,2. Table □ reports means of ASE(M + U), ASE{log(afj)} and SE(#) across 100 replicate 
datasets. MASEs and MSEs by NCS and FPCA approaches are significantly inflated, for exam- 
ple, being doubled and tripled in MASE(M + U) respectively, compared to SVR. We randomly 
select a data set and take a close examination. We calculate the individual ASE of the trajectory 
7T~ Sjli + Ui(tij) — M k . (tij) — Uiitij) | and select the top four subjects with the largest 

individual ASEs with respect to NCS and FPCA approaches respectively. 

Figure [2] shows estimates of the trajectory for six subjects. The plots reveal that the increased 
MASEs or MSEs by NCS and FPCA are caused by different reasons. NCS approach, treating 
one trajectory a time, ignores the similarity between the subjects within a group, leading to over 
fitting true trajectories (e.g. Figure WA) and[2](e)) with both over and under estimated volatili- 
ties. On the other hand, FPCA approach omits the heterogeneity of the subjects within a group; 
inflated MASE(M + U) are mainly contributed by a few subjects whose trajectory fluctuates with 
significantly higher volatility but is overly smoothed (e.g. Figure EJ^b) and[2](d)); and under the 
assumption of similar smoothness, the estimates of volatility are largely under estimated. Although 
this simulation study is in favor of SVR approach by design, the scenario we consider is neverthe- 
less realistic in practice and the simulation results reveal the drawbacks of omitting similarity or 
heterogeneity of volatilities by alternative approaches. 

Our second simulation study is conducted to evaluate the performance of SVR, NCS and FPCA 
when volatilities are homogeneous with no covariate effects. As in the first simulation study, 100 
replicate datasets are generated, each consisting of 100 trajectories at t G {0.2,0.4, • • • ,4}; twenty 
percent of data points are deleted completely at random; and subjects are assigned to one of two 
groups with equal probability. The observations are generated from Yi(t) = 10{t + sin(t)} + 
0.6aijCos(7rt/10) + 0.2a 2 iSm(7rt/10) + Eiit) for subjects in the first group and from Yi(t) = 10{t + 
cos(t)} + 0.5aijcos(7rt/10) + 0.3a2iSm(7rt/10) + eiit) for the ones in the second group, with an '~ ' 

11 



Ni(0,4), a 2i Ni(0, 1) and e l {t) ^& Ni(0, 1). The SVR, NCS and FPCA approaches are applied 
and MASE(M + U) is presented in Table [U in which SVR and FPCA approaches show close 
MASE(M + U), both smaller than the one by NCS approach. This suggests that SVR is no worse 
than FPCA for the cases with homogeneous volatilities. 

In short, through the two simulation studies, we demonstrate that SVR achieves substantial gains 
in terms of reducing the ASEs or SEs of the estimates of the trajectory, volatility and covariate effect 
when volatilities are heterogeneous, and works at least as well as FPCA approach when volatilities 
are homogeneous. 

[Table 1 about here.] 
[Figure 2 about here.] 



5 Applications 

It is a common practice to monitor the blood pressure of pregnant woman during pregnancy. Despite 
the trend of blood pressure being well studied, its fluctuation has been little addressed and the 
factors associated with the fluctuation are largely unknown. In this analysis, we apply the proposed 
SVR approach to analyze longitudinal blood pressure measurements in HPHB study, aiming to 
investigate the stability of blood pressure trajectories and identify the associated factors. The data 
set consists of 106 non-Hispanic white (NHW) and 176 non-Hispanic black (NHB) women whose 
first blood pressure measurement is collected before the 16th week of gestation and the last one no 
earlier than the 37th week of gestation. Most of subjects have 9 (35.10% of them), 10 (29.28%) or 
11 (14.98%) measurements spaced at irregularly times. The covariates we focused on include race 
as NHW vs NHB, mother's age group as age > 35 vs age < 35, obesity as yes vs no, preeclampsia 
as yes vs no, parity group as parity > vs parity = 0, and smoking as yes vs no. We run the 
proposed MCMC algorithm for 15,000 iterations, discard the first 5,000 and retain every 5th of 
the remaining samples for posterior analysis. The trace plots and autocorrelation plots suggest the 
algorithm converges fast and mixes well. Posterior summary of selected parameters is presented in 
Table H 



The panels (a) and (b) of Figure [3] show posterior means and 95% credible intervals of the 



average blood pressure for NHW and NHB groups respectively, which share a common pattern: 
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decreasing till the late stage of the second trimester (about 20 to 25 weeks) and then increasing 
toward the pre-pregnancy level. Within the ethnic group, significant heterogeneity exists in terms 
of the stability of the blood pressure trajectory at the individual level. As Figure [3] |(c)| indicated, 
posterior means of volatility vary from -0.5 to 2 in the logarithmic scale, suggesting some subjects' 
trajectories are parallel to the group blood pressure trajectory with very small volatilities while 
others may significantly depart from the group blood pressure trajectory. 

Most interesting, we find that obesity and preeclampsia are associated with blood pressure 



volatility, with their 95% credible intervals not covering zero in Figure [3] (d) This implies that 
pregnant women with obesity and/or preeclampsia are more likely to demonstrate irregular patterns 
of blood pressure relative to their ethnic group. We further examine the characteristics of these 
subjects with extreme volatilities (results not shown). Among the top eight subjects presenting 
with the largest volatilities, most of them are NHB with obesity and preeclampsia, do not smoke 
and give birth to a baby for the first time; half of them are younger than 35. For the eight subjects 
with the smallest volatilities, they are surprising homogeneous, i.e. all of them being NHW (except 
one) without obesity and preeclampsia, younger than 35, not smoking and giving birth to a baby 
before. 

[Table 2 about here.] 
[Figure 3 about here.] 



6 Discussion 

We have proposed a stochastic volatility regression model to investigate the volatility and its associ- 
ation with covariates for multi-subject functional data. As an important dynamic feature, volatility 
measures the stability of the biological process. The analysis of volatility not only reveals its hetero- 
geneity among subjects but also its dependence on the covariates of interest. Complementing the 
current FDA methods which mainly focus on the trend of trajectory and its derivatives, the SVR 
method initiates the exploration of stability of functional data. As illustrated with the blood pres- 
sure data, our view is that substantial new insights can be obtained in a rich variety of biomedical 
applications by studying volatility. 
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A Appendix: Proofs of Theoretical Results 



A.l Proof of Theorem [T] 



By the RKHS theory of the polynomial smoothing spline ( iWahbal . Il990l . Section 1.2), we have 

p— 1 n 

M k (t) = M fc o(t) + M fc i(t) = J^VkiMt) + J2 "kjKMifat) + VMi(t), 

1=0 j=l 

and 



{LPM k (t)} 2 dt = u' k R Ml u k + (v Mi (-),Vm,(-))h Kmi , 

where M (t) G Hn Mo and M x {t) G Hn Ml , the RKHSs Hn Mo = {/(*) : D p f{t) = 0,t G T} and 
Hn M = {f(t) '■ D l f{t) absolutely continuous for i = 0, 1, • • • ,p — 1, D p f(t) G £2(7")}; TZm (s, t) and 
T^Mi( s ,t) are reproducing kernels defined in Lemma [TJ r] Ml (-) G Hkm 1 * s orthogonal to 7?-Mi(^v) 
with inner product (K Ml {tj, 0>^i(0)«ti m = St DP ^M l {tj,u)D p r] Ml (u)du = for j = 1, 2, • • • , J; 
£2(7") = {/(£) : Jt-/ 2 (^)^ < 00} is the space of squared integrable functions defined on index set 

r. 

Similarly, 

9-1 

— > 

=0 

and 

J {D*Ui(t)} 2 dt = j'.Ru^ + W-)>^ t 



Ui(t) = U i0 (t) + U a (t) = ^auUt) + X^Tij^fei,*) + 7] Ua {t), 
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where U i0 (t) G Hk Uq and U a {t) G H nUl with Hk Uq = {/(*) : = 0,t G T} and ^ = 

{/(i) : D l f{t) absolutely continuous for i = 0, 1, • • • , q - 1, £> 9 ,/(t) e £ 2 (T)} the RKHSs with re- 
producing kernel lZu (s,t) and TZui(s,t) defined in Lemma [U r)u a {-) G H-r.^ is orthogonal to 
TZu^tij, ■) for j = 1, 2, ■ • - ,m. 

Hence, the double-penalized sum-of-squares (TjJ can be written as 



DPSS = ~( Y i ~ AittJtki ~ ^iR-Mi^ki ~ <t> ai <*i ~ Runli)' 

i=l Ui 

(Yi - A^/z^ - AiR Ml v h . - (jja.ati - R Uil j i )+ 

g m 

YI x My k R M^k + Y x u i i' i Ru a i i + 

k=l i=l 
3 m 

J] A ^4 (Wl (')' VM\ (")) + X Af/ > (" Wil W 



fe=l i=l 

which is minimized at (VM 1 (-),VM 1 (-))n nMi = {VuA^iVuA^Hn^ = °> leading to ^(-) = ^ = 0. 
A. 2 Proof of Corollary U 

Taking partial derivatives of double penalized sum-of-squares in Corollary [1] with respective to \x kl 
i/fe, CKj and 7j respectively and setting them to zeros, we have 

9 DPSS 



DPSS x 

~ = X —RhiA'ii^^ + AiR Ml u hi + (f) a .oci + Ru a li - Yi) + X Mk R Ml u k = 0, 



9 DPSS 1 

o = — ^(Ai^Mfc, + ^iRM^k, + + HoixTi - ^i) = °> 

9 DPSS 1 

— ~ = —Runi^^ki + ^iRMiVki + <t> ai a i + Runli ~ Y i) + X Ui R Uix ^ i = 0, 

which lead to 

^A^fc + * A %^ = ( 6 ) 
Jl Ml A0 M // fe + (.RmiA + X Mk I)R Ml u k = R Ml Y k , (7) 

fia^a^i + 4>' ai Ru a li = 4> a Yi, (8) 

Ru a <l> ai oii + (Run + riiXu^Ru^ = R Uu Yi, (9) 
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with Y k = — A 'i ( Y i ~ <*i - R u a li), Yi = Y i -A i <l> li fJi ki - A i R Mi t 'k i and A = ^ — A^Aj. 
The solutions of c*j and ^ in the step (a) can be obtained from equations and (jH]), while the 
solutions of fi k and v k in the step (b) from equations (JBj) and (J7|). 

A. 3 Proof of Proposition [I] 

Based on the SDE D p X(t) = W{t), we have 

D l X(t) = CX(t) + DW(t), 

where X = {X(t), D 1 X{t), ■ ■ ■ ,DP- 1 X(t)}', C = {c iV ) pKp , c w = 1 when i! = i + 1 and c«' = 
otherwise, and £> = (0, 0, • • • , 1)'. 
It follows that 

X j+1 = exp(C5j)Xj + / exp{C(5j - u)}DW(tj + u)du 

Jo 

= G j X j + 

where Gj = exp(Cdj) = J2l=o %C k and ~ N p (0, Wj) with 

Wj= I 3 exp{C(5j - u)}DD' exv{C'{5j - u)}du 
Jo 

as required. 
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Figure 1: (a) Mean arterial pressure (MAP) trajectories for twenty randomly selected normal women 
and women with preeclampsia; (b) Log-transformed empirical volatilities for women in the normal 
group and preeclampsia group. Yij denotes blood pressure for the ith woman at time t^, and 
Uij = — Yj is the deviation from the group mean blood pressure Yj. The empirical volatility 
measures the fluctuation of trajectories empirically and is defined as ^- YTj=\ 
the number of observations for the ith woman. 
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Figure 2: The plots of observation (o) and trajectory at time (x), as well as esti- 
mates of trajectory M h (t) + Ui(t) by SVR (— ), NCS ( ) and FPCA (■ ■ ■) approaches, 

for six subjects in one simulated dataset with the largest individual average squared errors 

i e;=i {M ki {Ui) + utfa) - M h {u 3 ) - u^)} 2 . 
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(c) (d) 

Figure 3: The posterior means and 95% highest posterior density (HPD) credible intervals for (a) 
the blood pressure during the 2nd and 3rd trimesters for non-Hispanic white group; (b) the blood 
pressure during the 2nd and 3rd trimesters for non-Hispanic black group; (c) the volatility in the 
logarithmic scale; (d) covariate effects. 
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Table 1: The mean of squared errors or average square errors of the estimates of trajectory, volatility 
and covariate effect across 100 replicate datasets . 









Case I 






Case II 


method 


M + U 


logK) 


A> 


Pi 




M + U 


SVR 


0.345 


0.614 


0.043 


0.081 


0.075 


1.122 


NCS 


0.609 


1.297 


0.089 


0.165 


1.724 


1.477 


FPCA 


1.099 


2.966 


1.144 


0.185 


1.969 


1.185 
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Table 2: Blood pressure data: Posterior summary of parameters in the SVR model. 



Parameter 


Mean 


Mode 


SD 


95% HPD invteral 


-I 


17.807 


17.818 


0.694 


[16.389, 19.106] 




0.236 


0.187 


0.181 


[0.040, 0.556] 




0.204 


0.162 


0.148 


[0.042, 0.472] 


< 


46.729 


46.412 


4.776 


[38.263, 56.619] 


a 2 


0.734 


0.741 


0.333 


[0.082, 1.295] 
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